Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Episode iterator upgrades #216

Merged
merged 18 commits into from
Oct 9, 2019
Merged

Episode iterator upgrades #216

merged 18 commits into from
Oct 9, 2019

Conversation

erikwijmans
Copy link
Contributor

@erikwijmans erikwijmans commented Sep 28, 2019

Motivation and Context

Add some upgrades to the episode iterator functionality. The primary thing this adds is the ability to shuffle scenes after some number of steps are taken. If you are shuffling after some number of episodes have ended, you end up shuffling too often at the start of training and not enough towards the end. This value is set to 10k by default. I consider this to be a "high but sane" default value. Not so low that it will dramatically slow-down training, but not so high that scenes will never be swapped.

Also randomizes the shuffle switch values to be within [0.8, 1.2] of the configured value. This helps not have all workers swap the same point for the shuffle by steps mode.

Adds an initial shuffle to the list of episodes as otherwise the same scene will always be initially loaded (which is bad).

Sets the max steps per scene to a high but sane default value. We generated a huge number of episodes per scene for training (50k IIRC) and you basically never switch scenes toward the end of training with that.

Also makes the scene order get shuffled by default as this makes sense for the default (people rarely change the defaults, so they should be set reasonably).

How Has This Been Tested

Via the tests

Types of changes

  • New feature (non-breaking change which adds functionality)

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Sep 28, 2019
@mathfac
Copy link
Contributor

mathfac commented Oct 1, 2019

@JasonJiazhiZhang feel free to comment on the PR.

Copy link
Contributor

@mathfac mathfac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bringing this to master. Checked the logic of EpisodeIterator it's not so easy to follow. Maybe in future we would like it to have separate easier to understand subclasses of EpisodeIterator.
Requested some changes.

self._rep_count = 0
self._step_count = 0

self._switch_scene_if()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next_episode is not updated based on results of self._switch_scene_if().

def _set_shuffle_intervals(self):
if self.max_scene_repetition_episodes > 0:
self._max_rep_episode = random.randint(
int(0.8 * self.max_scene_repetition_episodes),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create repetition_rand_interval = 0.2 param for EpisodeIterator and do:

            self._max_rep_step = random.randint(
                int((1 - self.repetition_rand_interval) * self.max_scene_repetition_steps),
                int((1 + self.repetition_rand_interval) * self.max_scene_repetition_steps),
            )

Or even create util method:

@static
def _randomize_value(value, interval):
    return random.randint(
                int((1 - interval) * value),
                int((1 + interval) * value),
            )

@@ -213,6 +213,11 @@ def _update_step_stats(self) -> None:
if self._past_limit():
self._episode_over = True

if self.episode_iterator is not None and isinstance(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While self.episode_iterator mentioned as Optional I don't see Env functioning without it here. Maybe check for subclass of EpisodeIterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate PR probably makes sense for changing that functionality (dataset is also marked as optional).

do_switch = True

if do_switch:
self._shuffle_iterator()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for self.shuffle == True, here when we shuffling.

Copy link
Contributor Author

@erikwijmans erikwijmans Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. _shuffle_iterator is used both to initiate a scene switch and to just the shuffle the episodes. self.shuffle decides whether or not to shuffle the episode order on cycle or on load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's strange as there was no option to enable cycling without episode shuffling.

Copy link
Contributor Author

@erikwijmans erikwijmans Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cycle logic doesn't rely on the shuffle

:param num_episode_sample: number of episodes to be sampled. :py:`-1`
for no sampling.
"""
self._repetition_rand_interval = 0.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move repetition_rand_interval to init argument with default value, otherwise no option to turn it off.

Copy link
Contributor

@mathfac mathfac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on following up the comments. Last item:
We are getting shuffle automatically if max_counts start working.
Maybe more logical will be to jump to next scene episodes without shuffling if shuffle isn't enabled.

Copy link
Contributor

@JasonJiazhiZhang JasonJiazhiZhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mathfac for looping me in and thank you @erikwijmans for implementing this! I have some minor comments. Overall looks great to me!

habitat/core/dataset.py Outdated Show resolved Hide resolved
habitat/core/dataset.py Outdated Show resolved Hide resolved
else:
self._max_rep_step = None

def _switch_scene_if(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question here: would it be switching to frequently at the later stage of training? with potentially less than 100 episode per scene switch? Will it help to incorporate both count schemes, like a logical AND?

Copy link
Contributor Author

@erikwijmans erikwijmans Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shuffling on steps is consistent from an optimization standpoint, you swap scenes every N parameters updates, which is why I like it :)

I don't think you can switch scenes "too often". Ideally, we'd just randomly sample a new episode irrespective of the scene it is in, but this incurs the non-trivial cost of swapping the scene way too often.

habitat/core/dataset.py Outdated Show resolved Hide resolved
habitat/core/dataset.py Show resolved Hide resolved
@mathfac
Copy link
Contributor

mathfac commented Oct 9, 2019

Pulled PR to understand the logic of changes. As result added some tests and small fix.

@@ -376,9 +377,6 @@ def _switch_scene(self) -> None:

if len(grouped_episodes) > 1:
# Ensure we swap by moving the current group to the end
if self.shuffle:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this shuffle deleted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, shuffles are already happening at the beginning and on each cycle. That's why this shuffle is redundant and can be omitted with no test failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess both logics are valid. Either you cycle through all scenes in a predetermined but random order, or you randomly draw a new scene from the set of other scenes. One doesn't seem inherently better than the other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that it's already in predetermined but random order because of self.shuffle in other places. So there is no need to shuffle again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep yep, it's just different ideas of how scenes should be selected on a swap. I am fine with this way if it makes more sense to you, both make sense to me.

@erikwijmans erikwijmans merged commit 600a1ef into master Oct 9, 2019
@erikwijmans erikwijmans deleted the episode-iterator-upgrades branch October 9, 2019 19:39
@mathfac mathfac added this to the Better engineering milestone Oct 9, 2019
dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request Jul 8, 2024
* Episode iterator upgrades

* Make default slightly lower

* Fix env.py

* fix is true

* Update docstring

* Fix formatting

* Turn off shuffle!

* Changes

* Fix test and incorperate comments

* Don't always shuffle on scene swap

* Rename and add doc string

* Cleanup

* Simplify shuffle logic

* Fixed small bug with initial rep_count value, added more tests, moved comments to assert messages.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants